Recurrence-free survival as a surrogate endpoint for overall survival after neoadjuvant chemotherapy and surgery for oesophageal squamous cell carcinoma

Abstract
Background
Overall survival is considered one of the most important endpoints of treatment efficacy but often requires long follow-up. This study aimed to determine the validity of recurrence-free survival as a surrogate endpoint for overall survival in patients with surgically resectable advanced oesophageal squamous cell carcinoma (OSCC).
Methods
Patients with OSCC who received neoadjuvant cisplatin and 5-fluorouracil, or docetaxel, cisplatin and 5-fluorouracil, at 58 Japanese oesophageal centres certified by the Japan Esophageal Society were reviewed retrospectively. The correlation between recurrence-free and overall survival was assessed using Kendall's τ.
Results
The study included 3154 patients. The 5-year overall and recurrence-free survival rates were 56.6 and 47.7% respectively. The primary analysis revealed a strong correlation between recurrence-free and overall survival (Kendall's τ 0.797, 95% c.i. 0.782 to 0.812) at the individual level. Subgroup analysis showed a positive relationship between a more favourable pathological response to neoadjuvant chemotherapy and a higher τ value. In the meta-regression model, the adjusted R² value at the institutional level was 100 (95% c.i. 40.2 to 100)%. The surrogate threshold effect was 0.703.
Conclusion
There was a strong correlation between recurrence-free and overall survival in patients with surgically resectable OSCC who underwent neoadjuvant chemotherapy, and this was more pronounced in patients with a better response to neoadjuvant chemotherapy.


Original Article. https://doi.org/10.1093/bjs/znae038

Introduction
Worldwide, oesophageal cancer ranks seventh and sixth in cancer incidence and mortality respectively 1. As oesophageal cancer metastasizes at an early disease stage 2,3, multimodal treatment is required 4,5. In Western countries, preoperative chemoradiotherapy or perioperative chemotherapy in combination with transthoracic oesophagectomy is standard of care 6,7. In Japan, neoadjuvant chemotherapy followed by oesophagectomy with radical lymph node dissection is indicated for oesophageal squamous cell carcinoma (OSCC) 8. After treatment with curative intent, the risk of recurrence remains high. Hence, adjuvant systemic therapies, including immune checkpoint inhibitors, are currently being investigated 9,10.
Overall survival (OS) is considered one of the most important endpoints of treatment efficacy in RCTs. A disadvantage of using OS as the primary endpoint is that it often requires a long follow-up time and large trial populations to detect statistically significant and clinically meaningful differences between study arms. Another disadvantage is that it is potentially affected by non-cancer causes of death and advances in treatment of recurrent or advanced disease. Therefore, statistically appropriate and clinically relevant surrogate endpoints should be explored.
Disease-free survival (DFS), progression-free survival (PFS), and recurrence-free survival (RFS) have been investigated as surrogate endpoints in colorectal, breast, lung, and gastric cancers 11-14. However, few studies have examined the use of these endpoints in oesophageal cancer. Kataoka et al. 15 showed that PFS was not an appropriate surrogate endpoint for OS using data from 10 clinical trials in oesophageal cancer. In contrast, Ajani et al. 16 demonstrated in a literature-based study that the HRs for DFS and PFS correlated with those for OS, and concluded that both reflect OS. However, both studies used aggregated data such as HRs and did not analyse individual-patient data. Studies using individual-patient data are more robust because trial-level correlations and individual-level correlations may not be consistent 17.
The primary aim of this study was to evaluate RFS as a surrogate endpoint for OS in patients who underwent surgery after neoadjuvant chemotherapy for OSCC. The impact of pCR rate on the association between RFS and OS was also assessed.

Study design
This retrospective, multicentre observational study was conducted across 58 Japanese hospitals recognized as Authorized Institutes for Board Certified Esophageal Surgeons by the Japan Esophageal Society. The study was approved by Keio University School of Medicine Ethics Committee and by all the participating centres (Ethics Approval Number 20231069). The study was performed in accordance with the Declaration of Helsinki. The need for informed consent was waived owing to the retrospective nature of the study. This study adhered to the STROBE guidelines 18.

Patient selection
The study included patients with OSCC who underwent subtotal oesophagectomy between 2010 and 2015. Patients with clinical stage I, II, III (excluding cT1 N0 and cT4b), or IV (based on supraclavicular lymph node metastases) OSCC who underwent neoadjuvant chemotherapy with DCF (docetaxel, cisplatin, and 5-fluorouracil (5-FU)) or CF (cisplatin plus 5-FU) were included. Patients undergoing salvage oesophagectomy after definitive chemoradiotherapy were excluded.

Data collection and definitions
Information on patient characteristics, clinicopathological factors, and surgical procedures was collected retrospectively from each hospital. Clinical stage before treatment was determined via oesophagogastroduodenoscopy (OGD) and CT. Based on the data, the extent of tumour spread was reassessed using the eighth edition of the TNM classification established by the UICC 19. Primary tumours were examined to evaluate the histological response to preoperative treatment in accordance with the Japanese Classification of Esophageal Cancer 20,21. This classification scheme includes five grades: grade 0, no tumour response; grade 1a, necrotic or fibrotic change observed in less than one-third of the tumour; grade 1b, necrotic or fibrotic change observed in between one- and two-thirds of the tumour; grade 2, more than two-thirds of the tumour is necrotic or fibrotic; and grade 3, no viable tumour cells.
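The five-grade scheme above can be sketched as a small lookup. This is only an illustration: the function name, the use of a single numeric fraction as input, and the handling of values falling exactly on one-third or two-thirds are assumptions, as the classification is a histological assessment and the text does not specify boundary cases.

```python
def pathological_grade(fraction_necrotic_or_fibrotic: float) -> str:
    """Map the fraction of tumour showing necrotic or fibrotic change to a grade.

    Hypothetical helper: boundary handling (exactly 1/3 or 2/3) is assumed.
    """
    f = fraction_necrotic_or_fibrotic
    if not 0.0 <= f <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    if f == 0.0:
        return "0"   # no tumour response
    if f == 1.0:
        return "3"   # no viable tumour cells
    if f < 1 / 3:
        return "1a"  # change in less than one-third of the tumour
    if f <= 2 / 3:
        return "1b"  # change in between one- and two-thirds (boundaries assumed)
    return "2"       # change in more than two-thirds of the tumour


for value in (0.0, 0.2, 0.5, 0.9, 1.0):
    print(value, pathological_grade(value))
```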

Treatment
Surgery entailed a transthoracic oesophagectomy with right thoracotomy and gastric tube reconstruction via the posterior mediastinal or retrosternal route and a two- or three-field lymph node dissection 22,23. This has been the standard curative surgical procedure since before 2010 in Japan. Mediastinal lymph nodes with bilateral recurrent nerve and abdominal lymph nodes were dissected routinely, including the paracardial lymph nodes and lymph nodes along the lesser curvature and left gastric artery. Additionally, supraclavicular lymph node dissection was performed if the primary tumour was situated between the upper and mid-thoracic oesophagus. All patients with stage IVB disease had supraclavicular lymph node metastases and underwent three-field lymph node dissection as recommended for these patients 20,24. Two courses of CF chemotherapy every 3 weeks was the standard preoperative treatment at most centres, in accordance with the JCOG 9907 study in Japan 8. Treatment with three courses of DCF was an alternative treatment option mostly administered every 3 weeks 25.
Postoperative follow-up included OGD and CT every 4-6 months annually until 5 years after operation.

Statistical analysis
OS was calculated from the date of surgery until day of death or last follow-up. RFS was calculated from the date of surgery until the day of death, recurrence, or last follow-up. At the individual level, Kendall's τ was employed to assess surrogacy between RFS and OS. Kendall's τ is a rank correlation coefficient ranging from −1 to 1, with values closer to 1 indicating a higher correlation. The association between the true endpoint (OS) and the surrogate endpoint (RFS) was evaluated using the following four methods. As a primary analysis, the illness-death model-based method 26 was used to estimate Kendall's τ between RFS and OS, taking into account the effects of competing risks. Simulation results showed that the illness-death model-based method performed well across several scenarios 26. The survival process in patients with cancer can be represented using a three-state illness-death model, with states corresponding to before recurrence, recurrence, and death. Estimating the correlation between the two failure time endpoints involved modelling the transition intensities between these states. As secondary analyses, three methods were used: the two-step method (a bivariable model based on the Clayton copula combined with the trial-specific Weibull model) 27, the joint frailty-copula model based on the Clayton copula 28, and the non-parametric inverse probability of censoring weighting (IPCW) method 29. The former two methods measure dependence structures between RFS and OS using copula models, which are used widely for modelling failure time endpoints. The IPCW method is an extension of Kendall's τ estimation method for survival time outcomes that accounts for the probability of censoring. As a subgroup analysis, Kendall's τ was estimated using the illness-death model-based method for each pathological grade (0, 1a, 1b, 2, 3), and for patients who received adjuvant chemotherapy and those who did not. As an exploratory analysis, surrogacy between each short-term postoperative endpoint (pCR or pathological grade) and OS was investigated using novel statistical methods. τ was estimated using an IPCW estimator modified for uncensored binary variables 29. The modified method adjusts for tied data that occur when the IPCW method is applied to a survival time outcome and a categorical outcome. Moreover, the C-index was estimated to examine the ability of pCR or pathological grade to discriminate OS 30.
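As a reminder of what Kendall's τ measures, the sketch below computes the basic rank correlation from concordant and discordant pairs. It is a deliberately simplified illustration that assumes every time is fully observed: it does not handle censoring and is not the illness-death, copula, or IPCW estimator used in the study. The function name and the example times are hypothetical.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a for paired, fully observed values (censoring is ignored)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("at least two pairs are required")
    concordant = 0
    discordant = 0
    for i, j in combinations(range(n), 2):
        # Positive product: both endpoints order the two patients the same way.
        product = (x[i] - x[j]) * (y[i] - y[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
        # Tied pairs (product == 0) count as neither concordant nor discordant.
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical recurrence-free and overall survival times in months.
rfs_months = [6, 14, 20, 33, 48]
os_months = [15, 18, 22, 60, 55]
print(kendall_tau_a(rfs_months, os_months))  # 9 concordant, 1 discordant -> 0.8
```

With censored follow-up many pairs cannot be ordered, which is why the estimators cited above are needed in practice.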
At the institutional level, the coefficient of determination, R², between the natural logarithm of the age-adjusted HRs for RFS and OS was used to assess surrogacy. HRs were calculated using DCF-treated patients as the treatment group and CF-treated patients as the control group. For estimating adjusted R², a meta-regression model was used that accounted for the sample size and HR variability across hospitals by using the generic inverse-variance method 31. Furthermore, the surrogate threshold effect (STE) was estimated, indicating the minimum RFS treatment effect required to predict a non-zero effect on OS 32. In future trials, to predict a non-zero OS effect, the upper limit of the prediction interval for the estimated HR for RFS should be lower than the STE. R version 4.3.0 (R Foundation for Statistical Computing, Vienna, Austria) was used, and the packages surrosurv, joint.Cox, metagen, metafor, dynpred, and survC1 were employed to implement the methods described above 27,28,31,33,34. The 95% confidence interval for τ was obtained by the surrosurv package and the 95% confidence interval for R² was estimated by the bootstrap method.
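The idea of regressing ln(HR) for OS on ln(HR) for RFS with inverse-variance weights can be sketched as follows. This is a plain weighted least-squares fit with an unadjusted weighted R², written under the assumption that each hospital contributes one pair of HRs and one variance; it is not the meta-regression, adjusted R², or bootstrap interval produced by the R packages named above. The function name and all numbers are hypothetical.

```python
import math


def weighted_line_fit(x, y, weights):
    """Weighted least-squares line y = intercept + slope * x and weighted R^2."""
    if not (len(x) == len(y) == len(weights)) or len(x) < 3:
        raise ValueError("need at least three matched observations")
    total_w = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / total_w
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / total_w
    s_xx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    s_yy = sum(w * (yi - y_bar) ** 2 for w, yi in zip(weights, y))
    s_xy = sum(w * (xi - x_bar) * (yi - y_bar)
               for w, xi, yi in zip(weights, x, y))
    if s_xx == 0 or s_yy == 0:
        raise ValueError("x and y must both vary across observations")
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    ss_res = sum(w * (yi - intercept - slope * xi) ** 2
                 for w, xi, yi in zip(weights, x, y))
    return slope, intercept, 1 - ss_res / s_yy


# Hypothetical per-hospital hazard ratios (DCF versus CF) and variances of ln(HR_OS).
hr_rfs = [0.62, 0.75, 0.88, 1.05]
hr_os = [0.70, 0.80, 0.95, 1.10]
var_log_hr_os = [0.04, 0.02, 0.05, 0.09]

slope, intercept, r_squared = weighted_line_fit(
    [math.log(h) for h in hr_rfs],
    [math.log(h) for h in hr_os],
    [1 / v for v in var_log_hr_os],  # inverse-variance weights
)
print(f"slope={slope:.3f} intercept={intercept:.3f} R^2={r_squared:.3f}")
```

Hospitals with more precise HR estimates (smaller variance) pull the fitted line more strongly, which is the purpose of inverse-variance weighting.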

Results
In total, 3154 patients were included. Figure S1 shows the study flow chart and patient characteristics are summarized in Table 1. The mean(s.d.) age was 65.45(7.80) years and half of the patients had a tumour in the mid-thoracic oesophagus. A total of 1046 (33.2%) patients received neoadjuvant DCF chemotherapy, and adjuvant therapy was given to 398 patients (13.2%). Clinical and pathological disease stages are shown in Table 1.
The 5-year OS and RFS rates were 56.6 (95% c.i. 54.9 to 58.4) and 47.7 (46.0 to 49.5)% respectively (Fig. 1). There were 1713 RFS events, 1444 deaths, and 1441 patients with both OS and RFS censored (Fig. 2). In the primary analysis using the illness-death model-based method at the individual level, a strong correlation between RFS and OS was found (Kendall's τ 0.797, 95% c.i. 0.782 to 0.812) (Table 2). Among the statistical analysis methods, the highest τ value of 0.805 (0.791 to 0.818) was obtained with the IPCW method.
Subgroup analysis showed that an increasing τ value corresponded to a more favourable pathological response (Table 3). The results demonstrated that OS and RFS were more strongly correlated in patients with a better treatment response to neoadjuvant chemotherapy. Moreover, the proportion of patients with equal OS and RFS and non-cancer deaths among all deaths increased as the pathological grade increased. The τ value was 0.721 for the patients who received adjuvant chemotherapy and 0.808 for those who did not (Table S1). In the exploratory analysis, the τ value was −0.025 (−0.335 to 0.285) between pCR and OS, and 0.062 (−0.086 to 0.209) between pathological grade and OS. Similarly, the respective C-index values were 0.521 and 0.575 (Table S2).
At the institutional level, the adjusted R² was 100 (95% c.i. 40.2 to 100)%; this was obtained using the meta-regression model adjusted for sample size and variability in HR. The equation for the meta-regression model was ln(HR_OS) = 0.868 × ln(HR_RFS) + 0.070. The slope of 0.868 is close to 1, indicating a strong correlation between OS and RFS (Fig. 3). The 95% prediction limits illustrate the expected range of OS effects given specific RFS effects. The STE was defined as the point where the upper prediction limit intersected the horizontal line indicating a HR of 1 for OS (null hypothesis). The STE value was 0.703. Therefore, in future trials with treatment settings similar to those of the present study, a HR for RFS below 0.703 would predict a HR for OS below 1 with 95% probability.
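To make the reported regression line concrete, the sketch below converts a HR for RFS into the point prediction of the HR for OS using the published slope and intercept. The function name is hypothetical, and only the fitted line is reproduced; the 95% prediction limits that define the STE depend on quantities not given in the text and are not computed here.

```python
import math

# Coefficients of the reported meta-regression line:
# ln(HR_OS) = 0.868 * ln(HR_RFS) + 0.070
SLOPE = 0.868
INTERCEPT = 0.070


def predicted_hr_os(hr_rfs: float) -> float:
    """Point prediction of the OS hazard ratio from an RFS hazard ratio."""
    if hr_rfs <= 0:
        raise ValueError("a hazard ratio must be positive")
    return math.exp(INTERCEPT + SLOPE * math.log(hr_rfs))


for hr in (0.5, 0.703, 1.0):
    print(f"HR_RFS={hr:.3f} -> predicted HR_OS={predicted_hr_os(hr):.3f}")
```

At the STE (HR for RFS of 0.703) the point prediction for OS is about 0.79; it is the upper prediction limit, not the point prediction, that reaches 1 at this value.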

Discussion
The present study has demonstrated a strong correlation between RFS and OS in patients with surgically resectable OSCC. Haslam et al. 35 conducted a large umbrella analysis of surrogate validation studies and reported that most surrogate endpoints in oncology had a low or modest correlation with OS. However, previous studies were solely literature-based and relied on aggregated data, such as HRs. Conducting studies using individual-patient data is crucial because correlations observed at the trial level may not align with those observed at the individual level 17. To date, there have been few studies on surrogate endpoints using individual-patient data. A study 36 in China investigated surrogate endpoints in 292 patients with advanced OSCC treated with immunochemotherapy, and concluded that the treatment effects for PFS had a weak correlation with the treatment effects for OS. However, the study had a limited sample size. The finding that RFS is a surrogate endpoint for OS in the present study may have implications for future studies as this may accommodate a shorter follow-up to reach the primary endpoint of the study.
An important finding was that OS and RFS were more strongly correlated in patients who responded better to preoperative chemotherapy. In the RCT data set analysed by Weber et al. 26, the correlation between PFS and OS was stronger in the treatment group than in the control group. They speculated that the possible mechanism could be that intensive therapy reduces the hazard of progression, thereby resulting in an increased proportion of patients with equal PFS and OS, and thus resulting in an increased τ value. This reflects that OS and RFS are congruent when more patients die from non-cancer causes without recurrence. Consistent results were obtained in the present study: the more effective the preoperative chemotherapy, the higher the proportion of non-cancer deaths and the higher the proportion of patients with equal OS and RFS (Table 3). These findings indicate that RFS may be used as a surrogate primary outcome in trials of neoadjuvant treatment modalities leading to high pCR rates. However, studies have reported that the prognosis of pCR differs between neoadjuvant chemotherapy and neoadjuvant chemoradiotherapy 37. Further research is warranted to determine whether RFS can be applied as a treatment endpoint in modalities other than neoadjuvant chemotherapy.
Kendall's τ value was compared between RFS and OS using four statistical methods. Simulation results showed that the illness-death model-based method performed well across several scenarios 26 and was thus used as the primary method in this study. The τ value for the illness-death model was high among the four methods and consistent with the simulation results. Simulation results also showed that the IPCW method performed well even with higher censoring rates 26. Similarly, the τ value calculated using this method was the highest in the present study. The τ values obtained using the other methods were consistently high, indicating the robustness of the study results.
Unlike the results between RFS and OS, the correlation between short-term endpoints (pCR or pathological grade) and OS showed only low τ and C-index values. Petrelli et al. 38 conducted a study using pCR as a surrogate for OS and concluded that there was no correlation between the two endpoints. However, the study calculated the coefficient of determination between the two endpoints using only literature-level ORs and HRs. In the present study, the analysis was robust because a novel statistical method was used to examine the surrogacy of binary variables and OS at the individual level. The data showed an association between these short-term endpoints and OS, but it was not strong enough to discriminate on its own, the C-index being slightly higher than 0.5. Furthermore, the τ value of the binary variable has a theoretical upper limit of 0.75; hence, there is also the problem that a high value cannot be obtained. It was reported previously that pathological grade is a reliable predictor of prognosis 39, and that endoscopic response is correlated with prognosis 40. In the future, early postoperative outcomes, such as pCR, may be used as the primary endpoint to
The study has limitations. First, it was retrospective and relied on real-world data, thereby resulting in a substantial proportion of censoring (Fig. 2). Although the estimation of Kendall's τ employed an illness-death-based model that accommodated censoring, further evaluation using individual-patient data in RCTs that typically exhibit lower levels of censoring is needed. Second, the STE for the HR of RFS was 0.703, which is generally considered to be a fairly strict criterion. However, the study was a retrospective simulation of the two neoadjuvant therapy approaches within two treatment arms. Consequently, the quantitative assessment should be interpreted with caution. Furthermore, a separate study 41 comparing the clinical outcomes of CF and of DCF treatments was performed. Finally, only 13.2% of the patients received adjuvant chemotherapy in this study and surrogacy may not be demonstrated in patients who routinely undergo adjuvant therapy. Subgroup analysis was performed exclusively for those having adjuvant chemotherapy; although Kendall's τ value remained high, the results should be interpreted with caution owing to the relatively small sample size. Similarly, immunochemotherapy is emerging in adjuvant and post-relapse treatment and this may require further study.

Fig. 3 Institutional-level association between treatment effects
A log scale was used for both axes. Each institution is represented by a bubble of a size proportional to the sample size and HR variability. The curved dashed lines represent the 95% prediction limits. The straight dashed line represents the STE. The 95% prediction limits illustrate the expected range of OS effects given specific RFS effects. The STE was defined as the point where the upper prediction limit intersected the horizontal line indicating a hazard ratio of 1 for OS (null hypothesis). OS, overall survival; RFS, recurrence-free survival; STE, surrogate threshold effect. Equation for meta-regression model: ln(HR_OS) = 0.868 × ln(HR_RFS) + 0.070. Adjusted R²: 100 (95% c.i. 40.2 to 100)%.

Table 3 Kendall's τ between recurrence-free and overall survival: subgroup analysis
Column headings: Pathological grade; n; Patients with equal OS and RFS as a proportion of patients excluding those with both OS and RFS censored; Non-cancer death as a proportion of all causes of death†
Values are n (%) unless otherwise indicated; *values in parentheses are 95% confidence intervals. †Excluding patients with unknown cause of death. The illness-death model-based method was used. OS, overall survival; RFS, recurrence-free survival.